[GPU] KV-cache compression micro_sdpa kernel #28004

Merged

sshlyapn (Contributor) commented on Dec 10, 2024

Details:

  • Added KV-cache compression support to the micro_sdpa kernel
  • Performance still needs tuning

Tickets:

sshlyapn added the category: GPU and do_not_merge labels on Dec 10, 2024
sshlyapn added this to the 2025.0 milestone on Dec 10, 2024
sshlyapn force-pushed the onednn_kv_cache_compression branch from ead8d8a to 5df2d53 on December 20, 2024 17:37
p-durandin marked this pull request as ready for review on January 14, 2025 09:51
p-durandin requested review from a team as code owners on January 14, 2025 09:51
p-durandin changed the title from "WIP: [GPU] KV-cache compression micro_sdpa kernel" to "[GPU] KV-cache compression micro_sdpa kernel" on Jan 14, 2025
vladimir-paramuzov (Contributor) left a comment

LGTM for the initial version. Needs further perf tuning.

p-durandin enabled auto-merge on January 16, 2025 10:25
p-durandin added this pull request to the merge queue on Jan 16, 2025
auto-merge was automatically disabled January 16, 2025 12:56

Pull Request is not mergeable

Merged via the queue into openvinotoolkit:master with commit 3d60768 Jan 16, 2025
167 checks passed
Labels: category: GPU (OpenVINO GPU plugin), Code Freeze, priority: high (High priority)
4 participants